Task adaptation using MAP estimation in N-gram language modeling

Authors

  • Hirokazu Masataki
  • Yoshinori Sagisaka
  • Kazuya Hisaki
  • Tatsuya Kawahara
Abstract

This paper describes a method of task adaptation in N-gram language modeling that accurately estimates N-gram statistics from a small amount of target-task data. Treating a task-independent N-gram model as a priori knowledge, the method adapts the N-gram to a target task by MAP (maximum a posteriori probability) estimation. In experiments, the perplexities of the task-adapted models were 15% (trigram) and 24% (bigram) lower than those of the task-independent model, and the perplexity reduction reached as much as 39% when the amount of text data in the adapted task was very small.
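The abstract does not state the estimation formula, so the following is only a minimal sketch of one common form of MAP adaptation, not the authors' exact method. It assumes a Dirichlet prior centered on the task-independent model, which gives P(w | h) = (c(h, w) + alpha * P_prior(w | h)) / (c(h) + alpha), where c are counts from the target-task text. The function name `map_bigram_prob`, the prior weight `alpha`, and the toy probabilities and corpus are all hypothetical.

```python
from collections import Counter

def map_bigram_prob(word, history, pair_counts, history_counts, prior, alpha=5.0):
    """MAP bigram estimate under an assumed Dirichlet prior.

    The prior is centered on the task-independent model `prior`;
    `alpha` (hypothetical) controls how strongly the prior is trusted.
    """
    c_hw = pair_counts[(history, word)]  # Counter yields 0 for unseen pairs
    c_h = history_counts[history]        # target-task count of the history
    return (c_hw + alpha * prior[history][word]) / (c_h + alpha)

# Hypothetical task-independent bigram model P(w | "book").
prior = {"book": {"a": 0.5, "flight": 0.1, "store": 0.4}}

# Hypothetical, very small target-task corpus of bigrams.
task_bigrams = [("book", "flight"), ("book", "flight"), ("book", "a")]
pair_counts = Counter(task_bigrams)
history_counts = Counter(h for h, _ in task_bigrams)

# Adapted distribution over the words that can follow "book".
adapted = {
    w: map_bigram_prob(w, "book", pair_counts, history_counts, prior)
    for w in prior["book"]
}
print(adapted)
```

With no target-task data the estimate falls back to the task-independent probability, and as task counts grow it moves toward the task's relative frequencies. This is the behavior that makes a prior-based estimate attractive when the adaptation text is small.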

Similar resources

Rapid adaptation of n-gram language models using inter-word correlation for speech recognition

In this paper, we study the fast adaptation problem of n-gram language model under the MAP estimation framework. We have proposed a heuristic method to explore inter-word correlation to accelerate MAP adaptation of n-gram model. According to their correlations, the occurrence of one word can be used to predict all other words in adaptation text. In this way, a large n-gram model can be efficien...

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Adapting language models for frequent fixed phrases by emphasizing n-gram subsets

In support of speech-driven question answering, we propose a method to construct N-gram language models for recognizing spoken questions with high accuracy. Question-answering systems receive queries that often consist of two parts: one conveys the query topic and the other is a fixed phrase used in query sentences. A language model constructed by using a target collection of QA, for example, n...

Language and Pronunciation Modeling in the CMU 1996 Hub 4 Evaluation

We describe several language and pronunciation modeling techniques that were applied to the 1996 Hub 4 Broadcast News transcription task. These include topic adaptation, the use of remote corpora, vocabulary size optimization, n-gram cutoff optimization, modeling of spontaneous speech, handling of unknown linguistic boundaries, higher order n-grams, weight optimization in rescoring, and lexical...

Publication date: 1997